Improvements and Generalizations of Stochastic Knapsack and Multi-Armed Bandit Approximation Algorithms: Full Version
Abstract
The multi-armed bandit (MAB) problem features the classical tradeoff between exploration and exploitation. The input specifies several stochastic arms which evolve with each pull, and the goal is to maximize the expected reward after a fixed budget of pulls. The celebrated work of Gittins et al. [GGW89] presumes a condition on the arms called the martingale assumption. Recently, A. Gupta et al. obtained an LP-based 1/48-approximation for the problem with the martingale assumption removed [GKMR11]. We improve the algorithm to a 4/27-approximation, with simpler analysis. Our algorithm also generalizes to the case of MAB superprocesses with (stochastic) multi-period actions. This generalization captures the framework introduced by Guha and Munagala in [GM07a, GM07b], and yields new results for their budgeted learning problems. Also, we obtain a (1/2 − ε)-approximation for the variant of MAB where preemption (playing an arm, switching to another arm, then coming back to the first arm) is not allowed. This contains the stochastic knapsack problem of Dean, Goemans, and Vondrák [DGV08] with correlated rewards, where we are given a knapsack of fixed size and a set of jobs, each with a joint distribution for its size and reward. The actual size and reward of a job can only be discovered in real time as it is being scheduled, and the objective is to maximize the expected reward collected before the knapsack size is exhausted. Our (1/2 − ε)-approximation improves the 1/16- and 1/8-approximations of [GKMR11] for correlated stochastic knapsack with cancellation and no cancellation, respectively, providing the first tight algorithms for these problems, matching the LP integrality gap of 2. We sample probabilities from an exponential-sized dynamic programming solution, whose existence is guaranteed by an LP projection argument. We hope this technique can also be applied to other dynamic programming problems which can be projected down onto a small LP.
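To make the correlated stochastic knapsack model concrete, the following Python sketch treats each job as a joint distribution over (size, reward) outcomes, pays a job's reward only if it completes within the budget, and estimates the expected reward of a fixed ordering by simulation. The particular jobs and the greedy reward-density ordering are illustrative assumptions, not from the paper; the paper's algorithms are adaptive and strictly more powerful than such a non-adaptive baseline.

    import random

    # A job is a joint distribution over (size, reward) outcomes, encoded as
    # a list of (probability, size, reward) triples; the realized reward is
    # correlated with the realized size.
    jobs = [
        [(0.5, 1, 2.0), (0.5, 3, 9.0)],  # hypothetical job 0
        [(0.8, 2, 3.0), (0.2, 5, 0.0)],  # hypothetical job 1
        [(1.0, 2, 4.0)],                 # hypothetical job 2 (deterministic)
    ]
    BUDGET = 5  # knapsack size

    def simulate(order, budget):
        """Schedule jobs in the given order; a job's reward counts only if
        the job finishes before the budget runs out (the [DGV08] convention)."""
        remaining, total = budget, 0.0
        for j in order:
            u = random.random()
            for prob, size, reward in jobs[j]:  # sample one joint outcome
                u -= prob
                if u <= 0:
                    break
            if size > remaining:  # job cannot finish: no reward, and we stop
                return total
            remaining -= size
            total += reward
        return total

    def expected_reward(order, budget, trials=100_000):
        return sum(simulate(order, budget) for _ in range(trials)) / trials

    # Non-adaptive baseline: sort by expected reward per unit of expected size.
    def density(dist):
        return (sum(p * r for p, _, r in dist)
                / sum(p * s for p, s, _ in dist))

    order = sorted(range(len(jobs)), key=lambda j: -density(jobs[j]))
    print(order, expected_reward(order, BUDGET))

An adaptive policy would instead choose the next job based on the budget actually remaining, which is exactly the gap the LP relaxations and their integrality gap of 2 quantify.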
Similar resources
Improvements and Generalizations of Stochastic Knapsack and Multi-Armed Bandit Algorithms: Extended Abstract
The celebrated multi-armed bandit (MAB) problem, originating from the work of Gittins et al. [GGW89], presumes a condition on the arms called the martingale assumption. Recently, A. Gupta et al. obtained an LP-based 1/48-approximation for the problem with the martingale assumption removed [GKMR11]. We improve the algorithm to a 4/27-approximation, with simpler analysis. Our algorithm also gen...
Anytime optimal algorithms in stochastic multi-armed bandits
We introduce an anytime algorithm for stochastic multi-armed bandits with optimal distribution-free and distribution-dependent bounds (for a specific family of parameters). The performance of this algorithm (as well as another one motivated by the conjectured optimal bound) is evaluated empirically. A similar analysis is provided with full information, to serve as a benchmark.
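For intuition about what "anytime" means here, the sketch below shows a standard anytime index policy, UCB1 of Auer, Cesa-Bianchi, and Fischer: its index depends only on elapsed time and per-arm statistics, so it needs no knowledge of the horizon and can be stopped at any point. This is background illustration only, not the (tighter) algorithm of the cited paper.

    import math, random

    def ucb1(arms, horizon):
        """Run the UCB1 index policy; `arms` is a list of no-argument
        callables returning stochastic rewards. Returns total reward."""
        n = len(arms)
        counts = [0] * n      # pulls per arm
        means = [0.0] * n     # empirical mean reward per arm
        total = 0.0
        for t in range(1, horizon + 1):
            if t <= n:        # play each arm once to initialize
                i = t - 1
            else:             # pick the arm with the largest upper index
                i = max(range(n), key=lambda a: means[a]
                        + math.sqrt(2 * math.log(t) / counts[a]))
            r = arms[i]()
            counts[i] += 1
            means[i] += (r - means[i]) / counts[i]  # running mean update
            total += r
        return total

    # Two illustrative Bernoulli arms with means 0.4 and 0.6.
    arms = [lambda: float(random.random() < 0.4),
            lambda: float(random.random() < 0.6)]
    print(ucb1(arms, 10_000))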
Time-Constrained Restless Bandits and the Knapsack Problem for Perishable Items
Motivated by a food promotion problem, we introduce the Knapsack Problem for Perishable Items (KPPI) to address a dynamic problem of optimally filling a knapsack with items that disappear randomly. The KPPI naturally bridges the gap and elucidates the relation between the PSPACE-hard restless bandit problem and the NP-hard knapsack problem. Our main result is a problem decomposition method resu...
Time-Constrained Restless Bandits and the Knapsack Problem for Perishable Items (Extended Abstract)
Motivated by a food promotion problem, we introduce the Knapsack Problem for Perishable Items (KPPI) to address a dynamic problem of optimally filling a knapsack with items that disappear randomly. The KPPI naturally bridges the gap and elucidates the relation between the PSPACE-hard restless bandit problem and the NP-hard knapsack problem. Our main result is a problem decomposition method resu...
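The deterministic endpoint of the bridge described above is the classic NP-hard 0/1 knapsack problem; as background (standard textbook material, not from the cited paper), a minimal dynamic-programming sketch follows. KPPI layers random perishing of items on top of this static model.

    def knapsack(items, capacity):
        """Classic 0/1 knapsack dynamic program over integer capacities.
        `items` is a list of (size, value) pairs; returns the maximum
        total value achievable within `capacity`."""
        best = [0] * (capacity + 1)
        for size, value in items:
            # Iterate capacities in reverse so each item is used at most once.
            for c in range(capacity, size - 1, -1):
                best[c] = max(best[c], best[c - size] + value)
        return best[capacity]

    print(knapsack([(3, 4), (2, 3), (4, 6)], 5))  # -> 7 (items of size 3 and 2)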
On Index Policies for Restless Bandit Problems
In this paper, we consider the restless bandit problem, which is one of the most well-studied generalizations of the celebrated stochastic multi-armed bandit problem in decision theory. In its ultimate generality, the restless bandit problem is known to be PSPACE-hard to approximate to any non-trivial factor, and little progress has been made on this problem despite its significance in modeling...
Publication date: 2013